Dependency Annotation of Coordination for Learner Language
Abstract
We present a strategy for dependency annotation of corpora of second language learners, dividing the annotation into different layers and separating linguistic constraints from realizations. Specifically, subcategorization information is required to compare to the annotation of realized dependencies. Building from this, we outline dependency annotation for coordinate structures, detailing a number of constructions such as right node raising and the coordination of unlikes. We conclude that branching structures are preferable to treating the conjunction as the head, as this avoids duplicating annotation.
Similar resources
Dependency Annotation for Learner Corpora
Building from the CHILDES dependency annotation scheme and on interlanguage POS annotation, we describe a syntactic annotation scheme developed for the data of second language learners. We encode subcategorization frames and underlying dependencies, in addition to the usual surface dependencies. The annotation scheme is relatively independent of language and can be mapped to learner errors.
Inter-annotator Agreement for Dependency Annotation of Learner Language
This paper reports on a study of interannotator agreement (IAA) for a dependency annotation scheme designed for learner English. Reliably-annotated learner corpora are a necessary step for the development of POS tagging and parsing of learner language. In our study, three annotators marked several layers of annotation over different levels of learner texts, and they were able to obtain generall...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
The Effect of Annotation Scheme Decisions on Parsing Learner Data
We present a study on the dependency parsing of second language learner data, focusing less on the parsing techniques and more on the effect of the linguistic distinctions made in the data. In particular, we examine syntactic annotation that relies more on morphological form than on meaning. We see the effect of particular linguistic decisions by: 1) converting and transforming a training corpu...
REALEC learner treebank: annotation principles and evaluation of automatic parsing
The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The original corpus is manually annotated for learners’ errors and gives information on the error span, error type, and the possible correction of the mistake provided b...